Interrogation is a crucial step in the investigation of criminal acts. Artificial 
intelligence has been used to increase the efficiency of interrogation. In this 
study, we developed a confession probability identification system to help 
investigators analyze the emotions of their interrogees while they are 
answering questions and determine the probability of them confessing. 
Based on these analysis results along with their own experience, 
investigators may adjust the content and direction of their interrogations to 
penetrate the interrogees’ defenses. The proposed system uses OpenFace and 
FaceReader to capture data and incorporates the multi-grained cascade forest 
(gcForest) and long short-term memory (LSTM) algorithms for deep 
learning. Our results indicated that the recognition accuracy of the gcForest 
algorithm exceeded that of the LSTM algorithm, which is consistent with the 
fact that the gcForest algorithm is more suitable for smaller sample sizes. In 
addition, heart-rate-based assessment may lead to erroneous determination 
of whether an interrogee is telling the truth or lies because their heart rate 
may increase as a result of emotional responses. 


This is an open access article under the CC BY-SA license. 


Corresponding Author: 


Yi Chang Wu 

Forensic Science Division, Investigation Bureau, Ministry of Justice 
No. 74, Zhonghua Rd., Xindian Dist., New Taipei City 231, Taiwan 
Email: shintenwu@gmail.com 


1. INTRODUCTION 

During a case investigation, the law enforcement agency notifies relevant individuals to appear for 
interrogation to clarify certain details and collect relevant evidence. Through this interrogation process, the 
truth of the incident can be clarified by the interrogees, and further information and leads can be gleaned 
from their statements. However, even though the interrogees are not necessarily the perpetrators, they may 
conceal the truth for certain reasons, especially if they are the actual perpetrators. In this case, the 
interrogators must rely on their past experience, observe and make judgments depending on the tone and 
attitude of the interrogees, and, if necessary, conduct polygraph tests as a reference. However, polygraph 
testing requires professional personnel and the consent of the interrogee, which is time-consuming and may 
result in a missed opportunity to collect evidence. Additionally, some objections have been raised against the 
use of polygraph results in judicial trials. Consequently, law enforcement agencies no longer consider 
polygraph results as valid evidence in court. 

Law enforcement agencies primarily focus on clarifying cases and obtaining evidence during 
investigations. Physical evidence, documentary evidence, and relevant individuals are interconnected. 
Investigators must utilize existing clues to reconstruct the incident. Many breakthroughs are achieved through 
the statements of the interrogees. Therefore, interrogating relevant individuals is essential to gather additional 
information about the case. In 2016, Keli [1] investigated the influence ratios (with vs. without influence) of 
17 factors affecting a suspect’s decision to confess voluntarily as shown in Table 1. They reported that the 
fear of being caught lying ranked fifth. However, when the situation in which the investigators have already 
obtained concrete evidence of crime was excluded, the same factor ranked fourth. These results suggest that 
if the lies of the interrogees can be immediately detected during the interrogation, then 75% of the interrogees 
are expected to tell the truth. Such voluntary confessions from the actual perpetrators greatly affect the 
investigation. 


Table 1. Factors affecting the voluntary confession of a suspect 

No.  Factor  Influence ratio (Yes / No) 
1  The investigators already had evidence of the crime, so resistance was meaningless.  17  3 
2  Others involved in the case have already confessed, so I did too.  12  8 
3  I confessed to receive a lenient punishment.  18  2 
4  I must take responsibility for my own actions.  9  11 
5  I was caught lying, so I had to tell the truth.  15  5 
6  I felt that it is better to confess than to remain silent.  17  3 
7  I confessed to thank the investigators for their assistance.  7  13 
8  I confessed to leave this place (interrogation room) as soon as possible.  16  4 
9  I felt a pressure to confess.  14  6 
10  I told part of the truth and concealed other parts to cover up other crimes.  12  8 
11  My actions will eventually be revealed, so it is better to confess early.  10  10 
12  I confessed to show that I was willing to cooperate.  15  5 
13  I do not know why I impulsively confessed.  4  16 
14  I cannot conceal the truth, so I have no choice but to confess.  14  6 
15  The investigators were kind to me, so I confessed.  6  14 
16  I confessed to get released on bail sooner.  14  6 
17  I confessed because I thought the investigation unit had evidence of my crime, which turned out to be untrue.  7  13 
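
The ranking and the 75% figure quoted in the text can be checked directly from the counts in Table 1. The following is a minimal sketch, not part of the original study: the names `TABLE_1`, `yes_ratio`, and `rank` are illustrative, and ties are assumed to share a rank (factors 5 and 12 both have 15 "Yes" answers).

```python
# Yes/No counts per factor, transcribed from Table 1 (20 respondents per factor).
TABLE_1 = {
    1: (17, 3), 2: (12, 8), 3: (18, 2), 4: (9, 11), 5: (15, 5), 6: (17, 3),
    7: (7, 13), 8: (16, 4), 9: (14, 6), 10: (12, 8), 11: (10, 10), 12: (15, 5),
    13: (4, 16), 14: (14, 6), 15: (6, 14), 16: (14, 6), 17: (7, 13),
}


def yes_ratio(factor, table=TABLE_1):
    """Share of respondents who said the factor influenced their confession."""
    yes, no = table[factor]
    return yes / (yes + no)


def rank(factor, table=TABLE_1, exclude=()):
    """Rank by 'Yes' count: 1 + number of remaining factors with a higher count."""
    target = table[factor][0]
    higher = [f for f, (yes, _) in table.items()
              if f not in exclude and yes > target]
    return 1 + len(higher)


ratio_caught_lying = yes_ratio(5)            # factor 5: "I was caught lying"
rank_all = rank(5)                           # all 17 factors considered
rank_without_evidence = rank(5, exclude=(1,))  # factor 1 (existing evidence) left out
```

Under these assumptions, factor 5 has a "Yes" share of 15/20, ranks fifth overall, and ranks fourth once factor 1 is left out, which matches the figures given in the text.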


With the continuous advancement of technology, digital imaging systems and big data have found 
extensive applications in the field of artificial intelligence (AI) recognition. Numerous studies have focused 
on the applications of image recognition, including facial and speech recognition for Internet of things (IoT), 
virtual reality, medicine, and license plate recognition [2]-[11]. These applications have substantially 
expanded and enriched various aspects of life. 

Emotion recognition is a popular area in facial image recognition that is extensively used in fields 
such as telecommunication, gaming, animation, psychiatry, automotive safety, and computer-based 
educational systems [12]. Initially, emotion recognition training relied on manual labeling. However, with the 
integration of systems and information technology, machine learning techniques have gained prominence, 
leading to the development of deep learning techniques [13], [14]. Some deep models, such as the gcForest 
model of [15], can be trained and verified with a small sample size within a short time. 

In some countries, law enforcement agencies have incorporated facial recognition into the 
interrogation process to reduce racial and gender bias [16]. In this study, we used the digital cameras and 
computers that are already installed in interrogation rooms to protect the rights of the interrogees. Running a 
confession probability identification system on this equipment gathers objective reference information for 
investigators. Our system is integrated into the preexisting cameras and computers, ensuring that it 
remains inconspicuous and minimally intrusive. This approach reduces the likelihood of arousing suspicion 
among the interrogees and facilitates the capturing of their genuine emotions without being detected. 

Although polygraph testing and AI image recognition operate on distinctly different principles, they 
both involve predictions based on individual behavior. Therefore, to realize a more effective and legally 
sound interrogation method, we attempted to replace conventional polygraph testing with noncontact 
methods and provide real-time assistance during the investigation process by leveraging the computational 
and logical capabilities of deep learning algorithms. The rest of this paper is organized as follows. Section 2 
introduces the research methods utilized, including how data were retrieved and how useful information was 
extracted for further analysis. Section 3 presents the results and discussion. Finally, section 4 outlines our 
conclusions. 


2. METHOD 
To elicit a confession, investigators must identify the deceptive information in the responses of the 
interrogees. However, because a large amount of relevant information must be scrutinized within a limited 
time frame, lowering the threshold for identification is crucial to avoid overlooking critical details in 
investigations. With the popularization of digital imaging systems, image recognition has advanced 
substantially through machine learning in AI. In addition, with the advent of deep learning techniques, 
time-consuming manual labeling and slow machine learning processes have been replaced by a new 
computational process [14], [17]. Deep learning can autonomously derive features and learn independently 
from only the raw data provided by the operators. 

Conventional polygraph testing primarily relies on physiological responses, such as blood pressure, 
brainwaves, and heart rate, to determine whether the examinees are being truthful [11]. Noncontact 
polygraph testing incorporates parameters such as skin color to determine the examinee’s blood pressure and 
heart rate; other parameters adopted include voiceprint, facial expressions, language patterns, and eye 
movements [11], [18], [19]. Unlike conventional polygraph testing, whose accuracy may be undermined 
when the examinee is nervous or adopts countermeasures, noncontact polygraph testing reduces the influence 
of said situations and has therefore become mainstream [20]. 

Since the introduction of the Facial Action Coding System in 1976, which enabled the identification 
of microexpressions [21], and the emergence of deep learning in AI, substantial advancements have been 
made in the field of noncontact polygraphy. Currently, various deep learning frameworks, such as deep 
neural networks (DNNs), convolutional neural networks, and recurrent neural networks (RNNs), are used in 
image recognition, speech recognition, and bioinformatics. In this study, we integrated microexpression 
recognition, photoplethysmography (PPG), and deep learning, namely multi-grained cascade forest 
(gcForest) and long short-term memory (LSTM), into a confession probability identification system to 
conduct real-time tests on the probability of lies and confessions. Figure 1 depicts the framework of the 
proposed system. Overall, the system enabled noncontact detection and prompt notification during 
interrogations, thereby facilitating immediate detection and presentation of results. 
Figure 1. Process of the confession probability identification system 


2.1. Microexpressions 

Microexpressions are brief unconscious facial expressions that occur when an individual attempts to 
conceal certain emotions. The facial action coding system is a widely used protocol that identifies and labels 
facial expressions by describing the movement of facial muscles. It objectively measures the frequency and 
intensity of facial expressions and analyzes emotions. In this protocol, facial expressions are divided into 
action units (AUs), each of which displays changes in a certain facial characteristic (e.g., raised eyebrows, 
wrinkled nose). This protocol has been used in psychological research to address various research questions 
pertaining to socioemotional development, neuropsychiatric disorders, and deception. Through AI facial 
recognition and model learning techniques, AUs can be immediately labeled and used to penetrate the 
defenses of interrogees [22], [23]. 


2.2. Remote PPG 

PPG is an optical technology used for measuring biomedical signals to analyze human skin at a low 
cost and interpret pulse information through skin reflectance variations due to changes in blood volume. 
Remote PPG is an advanced noncontact method for examining the skin surface of the face. When the face is 
properly illuminated, changes in blood volume due to pulse pressure can be detected, and the amount of 
reflected light can be measured. With reflectance mapped over time, each cardiac cycle is displayed as a 
peak. The data can then be converted into average heart rate and variability [24], [25]. 
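
The step from a reflectance trace to an average heart rate can be illustrated with a minimal sketch. It assumes a clean, synthetic signal sampled at 30 frames per second; the helper names `find_peaks` and `heart_rate_bpm` are hypothetical, and a real remote PPG pipeline would need filtering and motion compensation that are not shown here.

```python
import math


def find_peaks(signal):
    """Indices of local maxima that lie above the signal mean."""
    mean = sum(signal) / len(signal)
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > mean and signal[i - 1] < signal[i] >= signal[i + 1]]


def heart_rate_bpm(signal, fps):
    """Average heart rate from the mean interval between successive peaks."""
    peaks = find_peaks(signal)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate a heart rate")
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]  # seconds
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic reflectance trace: a 1.2 Hz pulse (72 beats per minute), 10 s at 30 fps.
fps = 30
signal = [math.sin(2 * math.pi * 1.2 * n / fps) for n in range(10 * fps)]
estimated_bpm = heart_rate_bpm(signal, fps)
```

Each cardiac cycle shows up as one peak, so the peak-to-peak intervals are the interbeat intervals from which both the average heart rate and its variability can be derived.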
2.3. Deep learning 

Despite its wide use in various fields, emotion recognition remains an unresolved problem. Deep 
learning is a framework based on artificial neural networks. A deep learning algorithm is an algorithm that 
learns features from data. It mimics the functions of the human brain to represent complex data from real- 
world scenarios and to facilitate informed decision-making. Deep learning has been widely used in the field 
of computer vision, including in image classification and object detection. It has also been used in biometrics 
to represent unique biometric data and enhance the performance of many identity verification and recognition 
systems, thus increasing the accuracy of facial recognition. 

DNNs are simply regarded as stacks of multiple layers of nonlinear functions. In situations where 
one wishes to eliminate manual determination of the nonlinear mapping relationship between two objects or 
where the relationship is difficult to determine, additional layers can be stacked to allow the machine to learn 
the relationship on its own, which is the original idea behind deep learning. Figure 2 shows the difference 
between a simple neural network and a DNN [26]. A simple neural network has a single hidden layer, 
whereas a DNN has two or more hidden layers. 
Figure 2. Difference between a simple neural network and a DNN 
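
The idea of stacking layers of nonlinear functions can be sketched in a few lines. This is an illustration only: the layer sizes, the tanh nonlinearity, and the random, untrained weights are assumptions chosen for brevity and are not the networks used in this study.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def forward(inputs, layers):
    """Apply the layers in order; the output of one layer feeds the next."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


rng = random.Random(0)
# Simple network: input -> one hidden layer -> output.
simple_net = [random_layer(4, 8, rng), random_layer(8, 2, rng)]
# Deep network: input -> three hidden layers -> output.
deep_net = [random_layer(4, 8, rng), random_layer(8, 8, rng),
            random_layer(8, 8, rng), random_layer(8, 2, rng)]

x = [0.1, -0.4, 0.7, 0.2]
simple_out = forward(x, simple_net)
deep_out = forward(x, deep_net)
```

The two networks differ only in how many hidden layers are placed between the input and the output; the forward pass itself is unchanged.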


2.3.1. gcForest 

Originally developed by Zhou and Feng [15], gcForest was established by stacking multiple layers 
of random forests in a cascading manner to achieve advanced feature representation and high learning 
performance. Unlike DNNs, gcForest requires only a small volume of training data to achieve satisfactory 
performance. In addition, because it has fewer hyperparameters, it does not require extensive tuning. It 
also incorporates adaptive tree-structured clustering, which reduces the computational resources needed 
and facilitates the training process [27]. The source code of gcForest is publicly 
available. The model is currently designed for labeled sequence data with a length of 80. Hence, gcForest can 
detect 20 microexpressions and 60 movements exhibited by interrogees each time they are questioned, with 
an output of either “Truth” or “Lie.” 
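
The cascading of forests can be written compactly. The notation below is an assumption introduced for illustration: $x$ is the input feature vector (here of length 80), $K$ is the number of forests per level, $L$ is the number of levels, and $p^{(\ell)}_{k}$ is the class-probability vector produced by forest $k$ at level $\ell$. It describes the general cascade idea, not necessarily the exact configuration used in this study.

```latex
\begin{align}
z^{(1)} &= x, \\
z^{(\ell+1)} &= \bigl[\, x,\; p^{(\ell)}_{1}\bigl(z^{(\ell)}\bigr),\; \dots,\;
                 p^{(\ell)}_{K}\bigl(z^{(\ell)}\bigr) \,\bigr],
                 \qquad \ell = 1, \dots, L-1, \\
\hat{y} &= \arg\max_{c \,\in\, \{\text{Truth},\, \text{Lie}\}}
           \frac{1}{K} \sum_{k=1}^{K} p^{(L)}_{k}\bigl(z^{(L)}\bigr)_{c}.
\end{align}
```

Each level therefore sees the original features together with the class probabilities estimated by the previous level, and the final label is the class with the highest average probability at the last level.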


2.3.2. LSTM 

LSTM is a type of RNN that was first introduced in 1997. It addresses the difficulty that RNNs have in 
retaining long-term memory, as well as their vanishing or exploding gradients [28]. As a nonlinear model, 
LSTM serves as a nonlinear unit for constructing larger DNNs. It has already been used in multiple fields [29]-[31]. 
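
For reference, one time step of an LSTM cell is commonly written as follows. This is the widely used formulation with a forget gate, which may differ in detail from the variant trained in this study; $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

```latex
\begin{align}
f_t &= \sigma\bigl(W_f x_t + U_f h_{t-1} + b_f\bigr), \\
i_t &= \sigma\bigl(W_i x_t + U_i h_{t-1} + b_i\bigr), \\
o_t &= \sigma\bigl(W_o x_t + U_o h_{t-1} + b_o\bigr), \\
\tilde{c}_t &= \tanh\bigl(W_c x_t + U_c h_{t-1} + b_c\bigr), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t).
\end{align}
```

The additive update of $c_t$ is what lets information, and gradients, persist over many time steps.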

In this study, three datasets were used for model training, namely the Real-Life Trial dataset [32], 
the Miami University Deception Detection Database [33], and the Bag-of-Lies dataset [34]. The Real-Life 
Trial dataset contains actual high-risk courtroom videos, whereas the Miami University Deception Detection 
Database and Bag-of-Lies dataset contain experimental low-risk videos filmed under laboratory conditions. 
The level of risk refers to the responsibility and potential consequences that interrogees may face as a result 
of lying. A high risk indicates that if an interrogee lies, they may face real-life consequences such as criminal 
charges and imprisonment. 

The confession probability identification system was trained during the preprocessing stage. The 
videos in the datasets were split into frames at a rate of 30 frames per second. Subsequently, the datasets 
were divided at a ratio of 7:3, with 70% of the samples used for training and the remaining 30% used for 
testing the training results [35]. Facial landmark detection was then performed using a constrained local 
neural field model. The model provided more than 700 features, with 35 related to facial AUs. However, 
because the p-values of AU01_r, AU23_r, and AU17_c were below the required threshold, these three AUs, 
which were presumed to lead to false judgments, were excluded to increase the identification success rate [35]. 
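
The exclusion of the three AUs and the 7:3 split can be sketched as follows. The feature values are invented for illustration, the column names follow the `AUxx_r` (intensity) and `AUxx_c` (presence) convention assumed here for the three excluded AUs, and the helpers `drop_excluded` and `split_70_30` are hypothetical. The text does not state whether the split was made per video or per frame, so the sketch simply splits whatever list of samples it is given.

```python
import random

# The three AUs that the text excludes from the feature set.
EXCLUDED_AUS = {"AU01_r", "AU23_r", "AU17_c"}


def drop_excluded(row):
    """Remove the excluded AU features from one sample's feature dict."""
    return {name: value for name, value in row.items()
            if name not in EXCLUDED_AUS}


def split_70_30(samples, seed=0):
    """Shuffle and split the samples 7:3 into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical samples; the values carry no meaning.
samples = [{"AU01_r": 0.1 * i, "AU12_r": 0.2, "AU17_c": 1.0,
            "AU23_r": 0.0, "AU45_c": 0.0} for i in range(10)]
cleaned = [drop_excluded(s) for s in samples]
train, test = split_70_30(cleaned)
```

If frames from the same video end up in both partitions, the test accuracy can be inflated, so splitting by video is the more conservative choice when the data allow it.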
3. RESULTS AND DISCUSSION 

The purpose of an interrogation is to compel the interrogees to tell the truth or to glean information 
from their statements to clarify a case or uncover leads. The key to solving many cases lies in the statements 
provided by the interrogees. Therefore, interrogation is an essential process for investigation units to reveal 
criminal facts. Although the interrogees are not necessarily the perpetrators, they may conceal the truth for 
certain reasons. However, lies can also serve as leads. In this study, we developed a confession probability 
identification system to help investigators detect lies immediately during an interrogation and penetrate the 
interrogees’ defenses to elicit a confession. 

Figure 3 shows the deep learning framework of the proposed system, which uses OpenFace to 
capture facial AU signals and FaceReader to detect heart rate variability through facial skin analysis. Deep 
learning algorithms, namely gcForest and LSTM, were used to estimate the probability of confession on the 
basis of weighted factors, such as truth, lies, emotions, heart rate, and heart rate variability (the root mean 
square of successive differences between normal heartbeats and the standard deviation of the interbeat 
intervals of normal sinus beats). An initial value of 100% indicates the highest level of credibility. As the 
interrogation proceeds, the credibility of an interrogee’s statement may decrease to 0%. 
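
The two heart rate variability measures named above can be computed from a list of interbeat intervals. This is a minimal sketch with hypothetical interval values in milliseconds; the standard deviation uses the sample form (dividing by n - 1), which is an assumption, since conventions differ between tools.

```python
import math


def rmssd(ibi_ms):
    """Root mean square of successive differences between interbeat intervals."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdnn(ibi_ms):
    """Sample standard deviation of the interbeat intervals."""
    mean = sum(ibi_ms) / len(ibi_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in ibi_ms) / (len(ibi_ms) - 1))


# Hypothetical interbeat intervals in milliseconds (about 75 beats per minute).
ibi = [800, 810, 790, 805, 795]
ibi_rmssd = rmssd(ibi)
ibi_sdnn = sdnn(ibi)
```

RMSSD reflects beat-to-beat changes, whereas SDNN reflects the overall spread of the intervals within the analyzed window.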


Figure 3. AI learning framework of the confession probability identification system (diagram labels: deep learning network; take 30 frames as a block) 
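
Grouping frames into blocks of 30 can be sketched as below. At the 30 frames per second used when splitting the videos, one block corresponds to about one second. Whether the blocks overlap and how a short final block is handled are not stated in the text, so this sketch assumes non-overlapping blocks and drops an incomplete tail; `blocks_of` is a hypothetical helper.

```python
def blocks_of(frames, size=30):
    """Split a frame sequence into consecutive, non-overlapping blocks.

    An incomplete block at the end of the sequence is dropped.
    """
    return [frames[i:i + size]
            for i in range(0, len(frames) - size + 1, size)]


# 95 placeholder frames give three full blocks; the last 5 frames are dropped.
frames = list(range(95))
blocks = blocks_of(frames)
```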


To differentiate between participants telling the truth and those telling lies, we conducted a pretest to 
observe the variations in heart rate and distribution of emotions. For this test, we used the Miami University 
Deception Detection Database [33] to determine the maximum heart rate of the participants both while telling 
the truth and while telling lies as shown in Figure 4. The results revealed an increase in heart rate when the 
participants told lies. 
Figure 4. Maximum heart rates of the participants while telling the truth and while telling lies 


We also used the Miami University Deception Detection Database [33] to investigate the 
distribution of emotions. After examining the maximum intensities of facial expressions as shown in 
Figure 5, we discovered that the participants exhibited more intense facial expressions associated with 
sadness, anger, and contempt when telling the truth, whereas their facial expressions associated with 
surprise and fear were more intense when telling lies. 
Figure 5. Maximum intensities of facial expressions: truth (pink) and lies (green) 


After completing the microexpression recognition system, we used gcForest and LSTM to analyze 
the three datasets. Table 2 lists the recognition accuracies for the three datasets. Notably, the recognition 
accuracy of gcForest exceeded that of LSTM for the three datasets. These recognition results are consistent 
with the fact that the gcForest algorithm is more suitable than the LSTM algorithm for small sample sizes. 


Table 2. Recognition accuracies of the three datasets 

Algorithm  Real-Life Trial dataset  Miami University Deception Detection Database  Bag-of-Lies dataset 
LSTM  84%  14%  63% 
gcForest  95%  91%  88% 


In this study, we used FaceReader to detect changes in facial skin color and thereby determine heart 
rate variability. Using FaceReader, we processed a large number of videos in batches and automatically 
generated charts through label classification. However, FaceReader required an authorized dongle for 
activation. In addition, complete frontal facial images without hat or mask obstruction were 
required during the recognition process. Therefore, for comparison, we selected five videos each of people 
telling the truth and people telling lies, all with complete facial expressions, and analyzed them against the 
heart rate and confession probability results. In other words, we compared the heart rate results obtained when FaceReader 
was activated versus when it was deactivated. The comparison results for truth and lie videos are presented in 
Figures 6 and 7, respectively. 
Figure 6. Comparison results for participants telling the truth 
Figure 7. Comparison results for participants telling lies 


The top portion of the chart depicts heart rate variability, whereas the bottom portion presents the 
recognition results of the system. The leftmost side of the chart represents the intensity of emotions, whereas 
the rightmost side represents the initial credibility level, set at 100%. Credibility changed with truthful and 
deceptive responses. It decreased when the interrogee told a lie and increased when they told the truth. 
According to the comparison charts, the heart rate observed with truth responses ranged between 50 and 80 
beats per minute (BPM), whereas that observed with lie responses ranged between 70 and 100 BPM. At some 
points, the heart rate ranges overlapped, but no distinct peaks were observed. In terms of heart rate changes 
associated with truth responses, Participant 3 exhibited a relatively elevated heart rate due to a sad emotional 
response. According to these detection results, no substantial heart rate changes were noted in the five videos. 
Furthermore, the elevated heart rates caused by participants’ emotional responses hindered the determination 
of whether they were telling the truth or lies depending only on their heart rate. 
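
The credibility behavior described above (a start at 100%, a decrease on a lie, and an increase on a truthful response) can be sketched as a running score. The fixed step of 5 percentage points and the function names are purely hypothetical; the system described here weights several factors, and those weights are not given in the text.

```python
def update_credibility(credibility, label, step=5.0):
    """Lower credibility on a 'Lie', raise it on a 'Truth', clamp to [0, 100]."""
    if label == "Lie":
        credibility -= step
    elif label == "Truth":
        credibility += step
    else:
        raise ValueError(f"unknown label: {label!r}")
    return max(0.0, min(100.0, credibility))


def credibility_trace(labels, start=100.0, step=5.0):
    """Credibility after each classified response, beginning at the start value."""
    trace = [start]
    for label in labels:
        trace.append(update_credibility(trace[-1], label, step))
    return trace


# Hypothetical sequence of per-response classifier outputs.
trace = credibility_trace(["Truth", "Lie", "Lie", "Truth"])
```

Clamping keeps the score within the 0% to 100% range mentioned in the text, so a truthful first answer leaves the score at its initial 100%.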


4. CONCLUSION 

In this study, we upgraded existing hardware and employed AI to develop a system that can provide 
interrogators with objective references to uncover the truth in interrogations. Given its ability to rapidly 
process big data, our system can aid in expediting investigations, thereby realizing the essence of judicial 
fairness. Our experimental results indicated that gcForest is more suitable than LSTM for small sample sizes. 
However, our datasets primarily contained short laboratory videos. In addition, other factors, such as the 
number of test samples and whether the participant was wearing makeup, may have influenced the training 
and recognition results. Therefore, further training with additional videos is required to increase the feature 
extraction accuracy of the proposed system. 